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AUTOMATED METHODS FOR SIMULATING A BIOLOGICAL NETWORK 

[0001] This application claims priority under 35 USC 1 19(e) to United States provisional 
application Serial No. 60/247,174 filed November 10, 2000, the entire contents of which are 
incorporated herein by reference. 

[0002] This invention was made in part with government support under Contract No. 
N00014-97-1-0422 awarded by the Office of Naval Research. The government has certain 
rights in this invention. 

[0003] The computer listing files contained on the compact discs filed along with this 
specification are incorporated by reference in their entirety. The computer listing files 
contained on the compact disks include the following: Cellerator Pallete.nb, 30kb, created 
11/7/01; Cellerator Pallete.txt, 30kb, created 11/7/01; cellatorl.nb, 1023kb, created 11/6/01; 
Celleratorl.txt, 1023kb, created 11/6/01; cellerator2.nb, 944kb, created 11/6/01; 
Cellerator2.txt, 944kb, created 11/6/01; cellerator3.nb, 371kb, created 11/6/01; and 
Cellerator3.txt, 371kb, created 1 1/6/01. The 2 compact discs filed herewith are identical in 
contents, each containing the above-mentioned files. 

BACKGROUND OF THE INVENTION 

FIELD OF THE INVENTION 
[0004] The invention relates generally to computer simulations of biological networks. 

BACKGROUND INFORMATION 
[0005] In the past few decades the rapid gain of information about intracellular signal 
transduction and genetic networks has led to the view of regulatory biomolecular circuits as 
highly structured multi-component systems that have evolved to perform optimally in very 
uncertain environments. This emergent complexity of biochemical regulation necessitates 
the development of new tools for analysis, most notably computer assisted mathematical 
models. Computer modeling has proved to be of crucial importance in the analysis of 
genomic DNA sequences and molecular dynamics simulations and is quickly becoming an 
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indispensable tool in biochemical and genetic research. In the past it has been necessary to 
manually translate chemical networks into differential equations and then solve them 
numerically. 

[0006] Several platforms have been developed that enable biologists to do complex 
computational simulations of various aspects of cellular signaling and gene regulatory 
networks. However, these new modeling environments have not been widely utilized in the 
biological research community. Among the reasons for this lack of acceptance is that the 
modeling interface is relative inaccessibility for the typical classically-trained geneticist or 
biochemist. Instead of cartoon representations of signaling pathways in which activation can 
be represented simply by an arrow connecting two molecular species, users are often asked to 
write specific differential equations or choose among different modeling approximations. 
Even for fairly modest biomolecular circuits such a technique would involve explicitly 
writing dozens (or even hundreds) of differential equations, a job that can be tedious, 
difficult, and highly error prone, even for an experienced modeler. Thus, there is a strong 
need for a modeling interface that automatically converts a cartoon- or reaction-based 
biochemical pathway description into a mathematical representation suitable for the solvers 
built into various currently existing software packages, 

[0007] In addition to being more accessible to a broader research community, a tool 
allowing the automatic generation of mathematical models would facilitate the modeling of 
complex networks and interactions. For example, in intracellular signal transduction it is not 
uncommon to find multi-molecular complexes of modifiable proteins. The number of 
different states that a multi-molecular complex, along with the number of equations required 
to fully describe the dynamics of such a system, increases exponentially with the number of 
participating molecules or classes of molecules. One typical complex is a scaffold complex 
involved in MAPK cascades. It is often the case that the dynamics of each state is of interest. 
A modeler then faces the unpleasant, and potentially error prone task, of writing dozens, if 
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not hundreds, of equations. Therefore, there remains a need for automatic equation 
generation tools that can significantly ease this task. 

[0008] Bhalla and Iyengar (Bhalla, U.S., and Iyengar, R., Science 283:38 1-387 (1999); ) 
have noted the need to systematically study interacting pathways with a standardized scheme, 
and have described several networks with mass-action kinetics using the Genesis simulator 
(Bower, J.M., and Beeman.D, The book of Genesis, Springer Verlag, Berlin (1998)). 
However, Bhalla does not disclose a system for automatically generating a series of 
differential equations from a user representation of a biological network. Furthermore, this 
system does not provide the user flexibility to manually intervene and modify differential 
equations before they are solved. Furthermore, these systems are not robust enough to be 
utilized for modeling of virtually any biological network such as those involved in 
developmental systems. 

SUMMARY OF THE INVENTION 
[0009] The present invention provides a general approach to automatic model generation 
(i.e. computer simulation) for the description of biological networks, including dynamic 
biological networks. The methods of the present invention facilitate biological modeling via 
automated equation generation based on the concept of a hierarchy of canonical forms that 
describe biological processes at various levels of detail. At each level of hierarchy two 
classes of canonical forms can be identified: the input canonical form, that is used to supply 
information to the program, and the output canonical form that is produced by a simulator. 
Thus, using the methods of the present invention, a user can input a representation of a 
biological network using a familiar, common biological notation form and the methods 
automatically generate and numerically solve a series of mathematical equations based on the 
representation. 
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[0010] Biological networks and the reactions at the core of these biological networks can 
be loosely classified in order of their biological complexity into the following: simple 
chemical reactions including degradation, enzymatic reactions in solution, multi-molecular 
complexes with a non-trivial number of states (e.g., scaffold proteins), multiple interacting 
and non-overlapping pathways, transcription, translation, intracellular components, transport 
processes and morphogenesis. The methods and systems of the present invention utilize 
general canonical forms that describe these biological networks and the reactions at the heart 
of these biological networks. These canonical forms can be either input forms, such as 
chemical reactions, or output forms, such as differential equations that are automatically 
generated using the methods and systems of the present invention. Furthermore, the present 
specification identifies these canonical forms so that an efficient mapping from the input 
forms to the output forms can be implemented. 

[0011] The methods in certain preferred embodiments include explicit output description 
and flexible user intervention at several steps through the model generation. This design, 
which allows intervention and modification of the model "on the fly" leads to increased 
model design flexibility and provides an immediate error correction mechanism. 
Furthermore, preferred embodiments of the present invention are illustrated which provide 
the modeling of developmental networks using an organism-as-a-graph approach using 
domains and fields. 

[0012] The present invention provides an automated method for simulating a biological 
network. The method includes the following: 

a) receiving initial condition values, process parameters, and a user 
representation of the biological network, wherein the user representation is input 
using one or more of a series of biological network canonical input forms, wherein 
each canonical input form is based on a type of biological process in the biological 
network; 
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b) generating a series of mathematical equations in an equation output 
canonical form based on the input representation of the biological network and the 
process parameters; and 

c) numerically solving the series of mathematical equations using the initial 
condition values and the process parameters, to generate a value or a table of values as 
a function of time for one or more output functions of the biological network, thereby 
simulating the biological network. 

[0013] In preferred embodiments, the series of mathematical equations are a series of 
differential equations. Furthermore, the method can further include manually modifying the 
series of mathematical equations before solving the series of mathematical equations. 

[0014] In certain aspects of the methods of the present invention, the generating a series of 
mathematical equations comprises generating a hierarchical arrangement of canonical input 
forms and associated canonical output forms from the input representation and the process 
parameters, wherein a level of the hierarchical arrangement includes the series of 
mathematical equations. In preferred embodiments of these aspects, the canonical output 
forms can be modifiable by a user at each level of the hierarchical arrangement. Typically, 
the series of mathematical equations and the value for one or more outputs are generated 
automatically. In certain preferred embodiments, the biological network is a developmental 
network. The biological network can include a cell and its progeny, and the method can 
provide a representation an organism as a graph, wherein the graph comprises a list of nodes, 
a list of links, and a lineage tree. The nodes can include one or more models that include a 
system of differential equations and associated parameters that describe some aspect of the 
biological network. 

[0015] The representation of the biological network in the method can include a series of 
graphics of a graphic user interface or a cartoon description of the biological process. 
Furthermore, the method can include before generating a series of mathematical equations, 
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generating a series of chemical equation canonical input forms based on a more detailed 
representation of the biological network than the user representation of the biological 
network. 

[0016] The method can further include defining a target output value and recording the 
input conditions that achieve this target output value, wherein the changes are automatically 
generated and steps a-c are automatically repeated until the target output value is attained. 

[0017] In another aspect the present invention provides a computer system that includes 
the following: 

a user interface capable of receiving and displaying initial condition values, process 
parameters, and a user representation of a biological network, wherein the user representation 
is input as a series of canonical input forms, wherein the format for each canonical input form 
is based on a type of biological process in the biological network; 

an interpreter function capable of generating a hierarchical arrangement of canonical 
input forms and associated canonical output forms from the input representation and the 
process parameters, wherein a level of the hierarchical arrangement comprises a series of 
mathematical equations; and 

an equation solver function capable of receiving the process parameters, the initial 
condition values, and the series of differential equations, and solving the differential 
equations, thereby generating a value or a table of values as a function of time for one or 
more output functions 

[0018] The series of mathematical equations are typically differential equations. 

[0019] In certain preferred embodiments of the present invention, the computer system 
includes a graphing function capable of receiving the value or table of values from the solver 
function, and generating a graph displaying the numerical values, wherein the graph is 
displayed on the user interface. 
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[0020] In certain preferred embodiments, wherein the interpreter function generates 
multiple levels of canonical output forms, the user interface displays the canonical output 
forms at every level to allow a user to modify the canonical output forms. 

[0021] In another aspect, the present invention provides a method for generating revenue 
comprising providing access to a computer system of the invention in exchange for 
consideration. The consideration is typically a user fee and the computer system is accessible 
through a LAN or WAN such as the internet. 

[0022] In another aspect, the present invention provides a computer program product for 
simulating a biological network comprising a computer-usable medium having a computer- 
readable program code for effecting the following steps within a computing system: 

a) receiving initial condition values, process parameters, and a user 
representation of the biological network, wherein the user representation is input 
using one or more of a series of biological network canonical input forms, wherein the 
format for each canonical input form is based on a type of biological process in the 
biological network; 

b) generating a hierarchical arrangement of canonical input forms and 
associated canonical output forms from the input representation and the process 
parameters, wherein a level of the hierarchical arrangement comprises a series of 
initial differential equations; 

c) numerically solving the series of mathematical equations using the initial 
condition values and the process parameters, to generate a value or a table of values as 
a function of time for one or more output functions of the biological network, thereby 
simulating the biological network. 
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[0023] The computer program product can effect generating the hierarchical arrangement 
of canonical output forms such that the canonical output forms are modifiable by a user at 
each level of the hierarchical arrangement. The biological network simulated by the 
computer program product can be a developmental network. Furthermore, the computer 
program can further effect representing a developmental network of an organism as a graph, 
wherein the graph comprises a list of nodes representing cells, a list of links of the cells, and a 
lineage tree of the cells 

[0024] In another aspect, the present invention provides an automated method for 
simulating a developmental process of an organism. The method includes the following 
steps: 

a) receiving initial condition values and process parameters for the developmental 
process of the organism; 

b) representing the organism or a tissue within the organism by a graph data 
structure, wherein the graph data structure includes: 

i) a list of links, each link representing the interaction between two cells; 

ii) a lineage tree recording the family tree of cell birth for the cells represented 

by the list of links; and 

iii) a list of nodes, each node representing a cell of the cells represented by the 
list of links, with an embedding describing the location of the cell in Cartesian coordinates 
and a set of differential equations describing the time evolution of the location of the cell, the 
differential equations including the initial condition values and process parameters, each node 
having a model that includes a system of differential equations and associated parameters 
describing the developmental process; and 

c) repeatedly solving the set of differential equations in a series of steps for a 
defined number of steps, wherein after each step results are generated and compared to a 
threshold to determine whether the developmental process has reached a trigger point for 
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changing the number of nodes in the list of nodes, thereby simulating the developmental 
process. 

A method according to this aspect of the invention preferably further includes: 
d) graphing the nodes using the Cartesian coordinates. 

[0025] The developmental process can be, for example, cell division, wherein reaching the 
trigger point adds a new node to the list of nodes. 

[0026] The developmental process can be, for example, cell death, wherein reaching the 
trigger point removes node from the list of nodes. 

[0027] The developmental process described by the differential equations can be a cell 
cycle pathway checkpoint, or can further include, for example, a signal transduction network 
of the cell. Preferably, each node is numbered as it is added to the graph data structure. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0028] Figure 1 illustrates a preferred user interface of the present invention. A palette is 
provided for selecting canonical input forms representing various reaction types. 

[0029] Figure 2 illustrates the topology of MAPK signaling cascade. Each double arrow 
represents activation through dual phosphorylation. Two and three-member scaffolds have 
been identified experimentally and are depicted here. 

[0030] Figure 3 provides canonical form for glycolysis reactions and corresponding ODEs 
generated by Cellerator™, a computer system for performing preferred embodiments of the 
methods of the present invention. 

[0031] Figure 4 illustrates the initial conditions for a 215 cell shoot apical meristem 
(SAM) simulation. Node shades indicate cell type. 
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[0032] Figure 5 is a diagram of the structure of graph domains. 

[0033] Figure 6 graphically illustrates "spring" potential (see equation 65). 

[0034] Figure 7 provides a graph showing the effect of relaxing several assumptions made 
in the previous report. The time integral of free dually phosphorylated MAPK over first 100 
sec. is plotted vs. scaffold concentration. The "control" curve (triangles) reproduces the data 
with all the assumptions made previously, whereas the other curves represent the results of 
relaxation of these assumptions such that K4 can form a complex with K3 in a scaffold 
(squares); phosphatases act on kinases in a scaffold (diamonds), and the first and second 
phosphorylation rates are equal (X). All data were obtained using the Cellerator™ package 
and are plotted in Microsoft Excel. Triangles represent resulting values for a control 
simulation; squares represent values obtained for a simulation performed where K4 can form 
a complex with K3 in the scaffold; Circles represent resulting values for a simulation 
performed where phosphatases act on kinases in the scaffold; and X represents values 
obtained for a simulation performed where the first and second phosphorylation rates are 
equal. 

[0035] Figure 8 provides Cellerator™ arrows (i.e., an example of specific chemical 
formula depictions), generated ODEs, and results of numerical integration for the mitotic 
oscillator (equations (76) to (79)). The interpret functions returns the differential equations; 
the run function returns Mathematical interpolating functions and/or code in SBML, C, 
FORTRAN, MATHML, XML, or HTML. 

[0036] Figure 9 describes a graph structure used in certain preferred methods of the 
present invention, for a unicellular organism with a minimal developmental system. The 
indices on the variables indicate cell number. 

[0037] Figure 10 illustrates an organism-as-graph with 20 cells. 

[0038] Figure 1 1 illustrates a lineage (family tree) of the simulation of Figure 10. 
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[0039] Figure 12 provides an exemplary Cellulator™ command syntax. 

[0040] Figure 13 provides an exemplary Cellulator™ command syntax. 

[0041] Figure 14 provides the implementation of automatic generation of the MAP 
kinases activation reactions (through phosphorylation) in the scaffold in the Cellerator™ 
environment, a preferred environment for performing the methods of the present invention. 
All the possible scaffold states (species) are generated as are the transition reactions between 
them. The indexes in the parentheses indicate the phosphorylation status of the kinase in the 
corresponding position, with -1 corresponding to the absence of the kinase from the scaffold 
complex. K[4,l] represents the external kinase activating the first MAP kinase (MAPKKK) 
in the cascade. 

[0042] Figure 15 provides a flow chart representing an automated method for simulating a 
biological network according to the present invention and illustrates that although a computer 
program is used to generate and solve the series of initial mathematical equations, preferably 
a user can intervene and modify canonical output forms generated throughout the process. 

[0043] Figure 16 provides a diagrammatic representation of a computer system for 
performing a method for simulating a response of a biological network. 

[0044] Figures 17a-d provides an output containing an initial set of reactions for a MAPK 
cascade on a Scaffold generated by the Cellerator™ program after the input described in 
Example 1, wherein the list of reactions are assigned to the variable c. 

[0045] Figures 18a-j provides a set of differential equations generated by the Cellerator™ 
program after the input described in Example 1, wherein the list of differential equations are 
assigned to the variable s. 
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DETAILED DESCRIPTION OF THE INVENTION 

[0046] The present invention provides an automated method for simulating a biological 
network. The method includes the following steps: 

a) receiving initial condition values, process parameters, and a user 
representation of the biological network, wherein the user representation is input 
using one or more of a series of biological network canonical input forms, wherein 
each canonical input form is based on a type of biological process in the biological 
network; 

b) generating a series of mathematical equations in an equation output 
canonical form based on the input representation of the biological network and the 
process parameters; and 

c) numerically solving the series of mathematical equations using the initial 
condition values and the process parameters, to generate a value or a table of values as 
a function of time for one or more output functions of the biological network, thereby 
simulating the biological network. 

[0047] Biological networks and the detailed chemical processes that occur within them, 
are described by the methods and systems of the present invention utilizing general canonical 
forms. These canonical forms can be either input forms, such as chemical reactions, or output 
forms, such as differential equations that are automatically generated using the methods and 
systems of the present invention. Biological networks are generally expressed in terms of the 
biochemical cascades that occur. The chemical reactions of these biochemical cascades 
constitute the core input forms of the present invention; the corresponding differential 
equations constitute the core output forms. Differential equations can be thought of as output 
because they can be passed on to solver and/or optimizer modules to handle. 
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[0048] Typically, as shown in step 10 of FIG. 15, to initiate the methods of the present 
invention, a user inputs a user representation of the biological network which is received by a 
computer program performing the methods of the present invention. The representation of 
the biological network can include, for example, a series of graphics of a graphical user 
interface, reaction scheme diagrams, a block diagram, and/or a cartoon description of the 
biological process. For example, Ichikawa discusses a palette-based graphical user interface 
for describing small biochemical systems and translating them into differential equations that 
can be subsequently solved with other utilities (Ichikawa, K., Bioinformatics, 17:483-484 
(2001)). According to the methods of the present invention, the accuracy of the user 
representation can be confirmed by comparing results of the automated methods of the 
present invention with results from experiments performed on living systems. Therefore, the 
present invention provides a method to confirm the accuracy of models of biological 
networks. 

[0049] The user representation of the biological network is input using one or more of a 
series of biological network canonical input forms. The biological network canonical input 
forms may take on various forms as discussed above. However, each canonical input form is 
based on a type of biological process, typically a type of biological reaction, in a biological 
network being simulated. For example, different canonical input forms can be used to 
represent a reaction in which a complex is formed, a dissociation reaction, a conversion, a 
degradation reaction, transcription regulation, an enzyme kinetic reaction, Hill function, or 
non-hierarchical cooperative activation reactions. Mathematical equations are then 
automatically generated by a computer program performing a method of the present 
invention, based on the type of reaction received by the program, according to the rules 
described herein. Typically, the mathematical equations are ordinary differential equations 
based on the canonical input form. 

[0050] The series of biological network canonical input forms and the equation output 
canonical forms in preferred embodiments are part of a hierarchy of canonical input forms 
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and related canonical output forms that describe a biological network at various levels of 
detail. In fact, preferred methods of the invention are based on the concept of a hierarchy of 
canonical forms that describe biological processes at various levels of detail. These preferred 
methods are based on the fact that biological systems are usually described in terms of signal 
transduction networks (STN). Nodes in an STN typically represent chemical species (e.g., 
nucleic acids, proteins, etc.) while links represent interactions between the species. Such 
networks are inherently hierarchical. Nodes may represent anything ranging from single 
molecules (e.g., particular enzymes, receptors) or ubiquitous modules (e.g., MAPK cascades, 
transcription complexes, etc.) to extremely complex processes such as mitosis (see, for 
example, Kohn, K.W., Mol. Biol. Cell. 10:2703-2734 (1999)). At the highest level of 
abstraction an input canonical form is pictorial (e.g., a cartoon drawn on the screen using 
some sort of GUI), while an output canonical form is a complete set of differential equations 
describing the network. 

[0051] At each level of hierarchy the two classes of canonical forms can be identified: the 
input canonical form, that is used to supply information to the program, and the output 
canonical form (OCF) that is automatically generated by an interpreter function, also called a 
simulator. Because of this hierarchical arrangement, the step 20 of generating a series of 
mathematical equations in an equation output canonical form based on the input 
representation of the biological network and the process parameters, can include generating a 
hierarchical arrangement of canonical input forms and associated canonical output forms 
from the input representation and the process parameters. In these embodiments, a final level 
of the hierarchical arrangement comprises a series of initial mathematical equations. 
Therefore, the automated methods of the present invention can generate a series of 
hierarchical levels of canonical output forms that are used to automatically generate a next 
level of canonical output forms. 

[0052] The methods of the present invention are based on the proposition that there is a 
one-to-one relationship between each class of interaction (link) in an STN and a hypothesized 
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formal (i.e., mathematical) description of that interaction. The methods of the present 
invention can represent nodes with variables (e.g., chemical concentrations) and links with 
specialized chemical reaction depictions. The method can include before generating a series 
of initial mathematical equations, generating a series of chemical equation canonical input 
forms providing a more detailed representation of the biological network than the user 
representation of the biological network. Both of these steps are performed automatically 
typically by a computer program performing the methods of the present invention. 

[0053] Typically, the series of mathematical equations generated in step 20 of the methods 
of the present invention are a series of differential equations. However, other types of 
equations can be used either in place of differential equations or along with differential 
equations. For example, stochastic equations, stochastic differential equations, or differential 
equations and simple algebraic expressions representing some algebraic constraints 
representing, for example, random noise can be used. The random noise represented by these 
equations includes, for example, diffusion rates or thermal noise. The algebraic constraints 
can specify some algebraic equations that are also true (e.g., say x+y+z=0) as well as some 
test conditions (such as If (x+y>z) then (w=x) else (w=y). 

[0054] As described above, the concept of a canonical form is central to the methods and 
systems of the present invention. At each level of information processing, typically there are 
both input and output canonical forms. The output forms can them be fed back into the 
system as input forms for the next stage of processing. The implementation of these 
canonical forms is platform and language dependent. It is the canonical forms themselves 
that are important to the methods and systems of the present invention, not their 
implementation. One can thus visualize a succession of canonical forms (Table 1). The 
various levels indicated in Table 1 are for illustrated purposes only; the actual names and 
succession of levels is not a core element of the methods and systems. 
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Table 1. Canonical forms for different levels of the Cellerator™ paradigm. STN: Signal 
Transduction Network; DAE: Differential-Algebraic Equation; ODE: Ordinary Differential 
Equation. 



Level 


Input Form 


Output Form 


Web Interface 


Web Database Record 


Graph-Representation of STN 


User Interface 


Cartoon Figure 


Graph representation of STN 


Notebook Interface 


Cellerator™ Arrow 


Chemical Equations 


Translator 
Solver 

Simulation Kernel 


Chemical Equations 
DAEs, ODEs 
Numerical Solutions 
plus Graph 


DAEs, ODEs 
Numerical Solutions 
Modified Graph 



[0055] Virtually any type of biological network can be simulated using the methods of the 
present invention. For example, the biological network can include the following: simple 
chemical reactions: degradation, enzymatic reactions in solution, multi-molecular complexes 
with a non-trivial number of states (e.g., scaffold proteins), multiple interacting and non- 
overlapping pathways, transcription, translation, intracellular components, transport 
processes, morphogenesis, simple development systems, tissue development and 
development of complex multicellular organisms. These biological networks include 
metabolic reaction networks, catabolic reaction networks, nucleic acid synthesis reaction 
networks, amino acid synthesis networks, energy metabolism and so forth. Other types of 
biological reaction networks include cell signaling networks, cell cycle networks, genetic 
networks involved in regulation of gene expression, such as operon regulatory networks, and 
actin polymerization networks that generate portions of the cytoskeleton. Most of the major 
cell functions rely on a network of interactive biological reactions. 

[0056] A wide variety of biologically-based interactions can be represented as specialized 
chemical reaction depictions, a type of canonical input form used in preferred embodiments 
of the methods and systems of the present invention. The chemical reaction depiction S 
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S§ fi P, for example, can be the input canonical form for a catalytic reaction in which 
species E facilitates the conversion of S into P. Output canonical forms can take either of two 
forms, selected at the user's discretion: lists of biochemical reactions, or systems of ordinary 
differential equations (ODEs). The corresponding OCF for the catalytic reaction, for 
example, can be either a system of ODEs as determined by the law of mass action, or a list of 
chemical reactions such as s+e^se, etc. A slightly different specialized equation form 
a^> b can indicate that the ODEs are determined by steady state kinetics (e.g., Michaelis- 
Menten) rather than mass-action equations. A similar notation A h> b can mean that A 
facilitates the transcription of 5. 

[0057] Although a computer system 100 is used to generate the canonical output forms, 
and typically canonical input forms from these canonical output forms, the methods of the 
present invention, in certain preferred embodiments, allow explicit output description at each 
level so that a user 110 can modify the equations at any stage desired. Users can thus 
manually modify the ODEs or chemical equations, or add additional constraints in the form 
of differential, algebraic, or chemical equations. For example, the series of initial 
mathematical equations, in certain preferred embodiments, can be modified before they are 
solved. As another example, where the method generates a series of detailed chemical 
equations as a canonical output form based on a the user representation of the biological 
network, the method can be implemented so that a user can manually modify the series of 
detailed chemical equations. 

[0058] Output functions of the methods of the present invention include virtually any 
output function of biology. Output functions include, but are not limited to, growth rate, 
cellular relationships and interactions during development, changes in protein 
phosphorylation patterns over time, changes in biomass over time, and yields of biomolecules 
such as proteins, carbohydrates, antibiotics, vitamins, amino acids, fermentation products, 
such as lactate production, yields of chiral compounds and other low molecular weight 

GT\6265491.1 
104662-60 



ATTORNEYDOCKET NO: 
CIT1 440-1 

18 

compounds, and the maximal internal yields of key co-factors, such as energy carrying ATP 
or redox carrying NADPH and NADH. 

[0059] The user representation of the biological network may be an informal, broadly 
focused, cartoon-based representation of the biological network. FIG. 2 is an example of a 
cartoon-based representation that can be used to represent a biological network. Certain 
shapes such as ovals can be used to represent proteins in a biological network, activation such 
as phosphorylation can be indicated by arrows, with double, triple, etc. phosphorylation 
indicated by double, triple, etc. arrows, respectively, and multimember complexes, such as 
scaffolds, can be represented by lines or figures that connect or horizontally or vertically 
overlap proteins of the multimember complex (as shown in FIG. 2). A computer program 
performing a method of the present invention can utilize this cartoon input form to 
automatically derive a series of canonical forms that represent the chemical reactions 
represented in the cartoon, and/or can automatically generate a series of initial mathematical 
equations from the cartoon. 

[0060] In certain embodiments of the present invention, rather than describing specific 
details of the biological network, the user can input the user representation of the biological 
network by selecting the biological network from a listing of biological networks appearing 
on a computer display provided by a computer system capable of performing the methods of 
the present invention. A detailed representation of the biological network selected by the 
user may be obtained by the computer system by accessing a database containing such 
information. Additionally, a computer system performing the methods of the present 
invention may include a listing, such as a listing stored in a database, of other biological 
networks, and may search the listing to identify other biological networks that interact with 
the biological network selected by the user. The computer system may utilize information 
regarding the other biological networks in generating a value for one or more output 
functions of the biological network using the methods of the present invention. The listing of 
biological networks may be provided by a provider of a computer program capable of 
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executing the methods of the present invention. Furthermore, the methods of the present 
invention can include a step whereby user representations of biological networks are stored in 
a listing of biological networks which is searched for interactions with an on-test biological 
network as described above. 

[0061] Initial condition values received in step 10 include any value that is important to 
determine product formation of a biological reaction. Such values include, for example, 
concentrations or quantities of reactants, or enzymes, pH, temperature, cellular geometry, etc. 
For example, where the biological network is cell divisions of a single cell system, initial 
condition values are specified, for example, for cell geometry and initial concentrations of 
reactants such as proteins involved in mitotic regulation (e.g. cyclins, CDC2 kinase, cyclin 
protease, etc.). 

[0062] Process parameters received in step 10 include rate constants, connection strengths, 
thresholds for activation, cooperativity, spatial geometry including cell position, etc. 

[0063] The methods of the present invention can be extended from a simulation tool to a 
learning tool in which target output values (i.e. desired outputs) can be specified to obtain the 
parameters and therefore the model which leads to the target output values (Rienitz, J., et al. 
"A Connectionist Model of the Drosophila Blastoderm," in The Principles of Organization in 
Organisms, (eds. Jay E. Mittenthal and Arthur B. Baskin, Santa Fe Institute Studies in the 
Sciences of Complexity, Addison-Wesley) (1992); and Reinitz J., et al., Journal of 
Experimental Zoology 271 :47-56, (1995)). Thus, the method can further include defining a 
target value for an output function and recording the input conditions that achieve this target 
value for the output function, wherein changes in input conditions are automatically 
generated and steps a-c are automatically repeated in an iterative manner until the target 
output value is attained. 

[0064] The method of the present invention can be used to simulate dynamic biological 
networks (i.e. biological networks which change over time). For these simulations, steps a-c 
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are typically repeated after changing one or both the initial condition values, or the 
representation of the biological network, based on the results of the previous cycle. The 
cycling is typically carried out for a user defined number of cycles based on a relevant time 
period. 

[0065] Furthermore, the present invention can be used to predict the effects of alterations 
of the biological network on one or more outputs of the biological network. Thus, the present 
invention can be used to predict the effect of small organic compounds or other on-test 
compounds on the biological network to assist in biopharmaceutical product development. 

[0066] The method for simulating a response of a biological network (FIG. 15) is carried 
out using a computer system 100, as schematically represented in FIG. 16, which itself 
represents another aspect of the present invention. The computer system typically includes a 
user interface capable of receiving and displaying initial condition values, process 
parameters, and a user representation of a biological network, from a user. The user 
representation is input as a series of canonical input forms. The computer system 110 
includes an interpreter function 130 for performing step 20. That is, the interpreter function 
130 is capable of receiving the canonical input forms and the process parameters, and 
generating at least a first level of output canonical forms, and typically multiple levels of 
output canonical forms, based on the input canonical forms. The output canonical forms 
include a series of mathematical equations, typically as a final level of canonical output 
forms, generated from the user representation of the biological network and the process 
parameters. In certain preferred embodiments, the user interface displays the canonical 
output forms at every level, and allows a user 110 to modify these canonical output forms. 

[0067] The computer system 100 includes an equation solver function 140 capable of 
receiving the series of initial mathematical equations, the initial conditions values, and the 
process parameters, and generating a numerical value, typically a table of values as a function 
of time, for one or more output functions using the mathematical equation and the initial 
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condition values (Step 30). This step 30 typically includes mathematically solving the 
equations before or while, they are numerically solved. In certain preferred embodiments of 
the present invention, the computer system 100 includes a graphing function, capable of 
receiving numerical values from the numerical solver function, and generating a graph from 
them which is displayed on the user interface. The computer system may also include an 
optimizer function which fits data to the equations, preferably differential equations, thereby 
determining optimal or improved values for some or all of the process parameters 

[0068] The computer systems 100 used to perform the methods of the present invention 
can implement a variable structure system using a commercially available pre-packaged fixed 
structure differential equation solver capability such as Mathematica™'s NDSolve. For 
example, the computer program can have a Mathematica® notebook interface and/or a 
graphical user interface. One of ordinary skill in the art will recognize that the translator 
function can be used with many different user interfaces (i.e. "front ends") as well 
Furthermore, the computer system can include a graphing capability that generates graphs of 
the numeric values for output functions, which can be displayed on the user interface. 

[0069] Typically, the user interface allows the user to input the user representation of the 
biological network manually, or to select the canonical input forms used in the user 
representation from a specialized computer input interface. The specialized computer input 
interface may be a GUI interface. For example, the user can input the canonical input forms 
that represent a biological network using a specialized palette, for example that depicted in 
FIG. L The palette can include a reaction list section 10 that includes input canonical forms 
that can be selected by a user to construct the specific input canonical forms that represent a 
biological network. The palette can be designed using commercial programs such as 
Mathematica® (Wolfram Research, Inc., Champaign, IL), etc. Palettes can also be written 
with any standard computer language such as, for example, C, C++, or Java, or with other 
programs such as MatLab (The MathWorks, Inc., Natick, MA), or Lab View (National 
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Instruments, Austin, TX). Furthermore, the palette can be designed such that it can be 
rearranged by the user. 

[0070] One or more of the functions of the computer system of the present invention can 
be developed using a any of a number of currently-available tools in addition to 
Mathematica® {e.g., BioSpice, DBSolve, E-Cell, Genesis, Gepasi, M-Cell, Neuron, Stochsim, 
V-Cell, XSIM; for references see Hucka M., et al, 2 March 2001, 

http://www.cds.Caltech.edu/erato. Preferably, the tool that is used to implement the methods 
and computer systems and programs of the present invention is a computer algebra program 
(e.g., Mathematica®). A computer algebra program tool facilitates weight-sharing and 
indexing, and many other "semantically deep transformations" that can be performed by the 
methods of the present invention during the canonical form to canonical form translations. In 
a preferred embodiment, the methods and systems of the present invention generate output in 
a standardized data transfer protocol such as the systems biology markup language (SBML) 
proposed by Hucka et al ("The ERATO Systems Biology Workbench: An Integrated 
Environment for Multiscale and Meltitheoretic Simulations in Systems Biology," in 
Foundation of Systems Biology (ed. Hiroaki Kitano) 125-44, MIT Press (2001)). 
Furthermore, Python script can be used to convert SBML into compilable C code. 

[0071] Output canonical forms can be generated by the interpreter function in a variety of 
formats: for example, as specialized chemical reaction depictions, Mathematica® differential 
equations, in C, FORTRAN, SBML (Hucka M. ? et al. (2001)), MATHML, or HTML. If 
desired, the user can also solve the equations numerically (e.g., using Mathematica® 's 
NDSolve Wolfram, S., The Mathematica® Book. Fourth Edition. Cambridge University 
Press, New York (1999)). 

[0072] The computer system used to perform the methods of the present invention can 
include a database of information regarding one or more biological networks and biological 
processes within the biological networks. This information can include identification of 
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biomolecular reactants, products, cofactors, enzymes, rates of reactions, coenzymes, etc. 
involved in a biochemical network. This information can include the stoichiometric 
coefficients that indicate the number of molecules of a compound that participates in a 
particular biochemical reaction. This information can include details of multimeric reactions 
and interacting and overlapping pathways, as well as information regarding locations of 
reactants, (i.e. if they are in a membrane, in the cytoplasm, or inside an organelle such as the 
mitochondria). The information can also include experimentally derived or calculated rates 
of reactions under various conditions. Furthermore, the database can include any type of 
biological sequence information that pertains to a biological network. Finally, the data in the 
database is preferably stored in a standardized form that is or can be rapidly converted to, 
canonical input forms of the present invention. 

[0073] The database can be a flat file database or a relational database. The database can 
be an internal database, or an external database that is accessible to users, for example a 
public biological sequence database, such as Genbank or Genpept. An internal database is a 
database maintained as a private database, typically maintained behind a firewall, by an 
enterprise. An external database is typically located outside a user ! s local computer network, 
and is typically maintained by a different entity than that which maintains a user's computer 
and local network. Many external public biological information databases are available and 
can be used with the present invention. For example, Kyoto Encyclopedia of Genes and 
Genomes (KEGG)(Available at http://www.genome.ad.jp/KEGG/), EcoCyc and Metacyc 
(Available at http://malibu.ai.sri.com/), EMG (Available at http://emp.mcs.anl.gov/) , 
BRENDA (Available at http://www.brenda.uni-koeln.de/) . etc., as well as many of the 
biological sequence databases available from the National Center for Biological Information 
(NCBI), such as Genbank. These databases can be used in the methods of the present 
invention. 

[0074] The function of a computer system of the present invention typically includes a 
processing unit that executes a computer program product, itself representing another aspect 
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of the invention, that includes a computer-readable program code embodied on a computer- 
usable medium and present in a memory function connected to the processing unit. The 
memory function can be ROM or RAM, for example. 

[0075] The computer system used to perform the automated methods of the present 
invention can be a stand-alone computer or a conventional network system including a 
client/server environment, and optionally one or more database servers. A number of 
conventional network systems, including a local area network (LAN) or a wide area network 
(WAN), are known in the art. Additionally, client/server environments, database servers, and 
networks are well documented in the technical, trade, and patent literature. For example, the 
server can run on an operating system such as UNIX, running a World Wide Web 
application, and a World Wide Web Server. 

[0076] The computer program product that is read and executed by the processing unit of 
the computer system of the present invention, includes a computer-readable program code 
embodied on a computer-usable medium. The program code is capable of effecting the 
following steps within a computing system: 

a) receiving initial condition values, process parameters, and a user 
representation of the biological network, wherein the user representation is input 
using one or more of a series of biological network canonical input forms, wherein the 
format for each canonical input form is based on a type of biological process in the 
biological network; 

b) generating a hierarchical arrangement of canonical input forms and 
associated canonical output forms from the input representation and the process 
parameters, wherein a level of the hierarchical arrangement comprises a series of 
initial differential equations; and 

c) numerically solving the series of mathematical equations to generate a value 
or a table of values as a function of time for one or more output functions of the 

GT\626549U 
104662-60 



ATTORNEYDOCKET NO: 
CIT1440-1 



25 

biological network by inputting the initial condition values and the process parameters 
into the solved mathematical equations, thereby simulating the biological network. 

[0077] The computer program product can effect generating the hierarchical arrangement 
of canonical output forms such that the canonical output forms are modifiable by a user at 
each level of the hierarchical arrangement. The biological network simulated by the 
computer program product can be a developmental network. Furthermore, the computer 
program can further effect representing a developmental network of an organism as a graph, 
provides a representation an organism as a graph, wherein the graph comprises a list of nodes 
representing cells, a list of links of the cells, and a lineage tree of the cells. 

[0078] In another aspect, the present invention provides a method for generating revenue 
comprising providing access to the computer system of the present invention in exchange for 
consideration. The consideration is typically a user fee, for example a per-use fee or a 
periodical fee for access to the computer system. For example, the computer system can be 
accessed by a user via a LAN or a WAN, such as the Internet. 

[0079] The following section identifies core biochemical reactions that describes various 
biological processes, the representation of these core biochemical reactions as specialized 
chemical reaction depictions, a type of canonical input form, and the interpretation of these 
core chemical reactions as differential equations. A fundamental library of simple chemical 
reactions can be quickly developed; such reactions take the form 

X,eS'cS Y,eS*cS 

where S is a set of reactants and S' and S" are (possible empty and possibly non- 
distinct) subsets of S and k is a representation of the rate at which the reaction proceeds. In 
general there are rarely more than two elements in either S* or S" but it is possible for there to 
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be more. For example, all of the following chemical reactions fall into this form: 
A + B -> C = AB complex formation 

C = AB^A + B dissociation 

A — > B conversion (2) 

A — > ^ degradation 
$ —> A creation (e.g., through transcription) 

[0080] Enzyme kinetic reactions, which are usually written as 

S + E^>P + E (3) 

where E is an enzyme that facilitates the conversion of the substrate S into the product 
P y would also fall into this class. More generally, equation 3 is a simplification of the 
cascade 

S + E<r>SE^>S+P (4) 

where the bi-directional arrow indicates that the first reaction is reversible. Thus (4) 
is equivalent to the triplet of reactions 

{S+E-+SE, SE->S+E,SE-±S+P] (5) 

[0081] The reactions (4) or (5) can be written compactly with the following double-arrow 
notation 

E 

S=>P (6) 

which should be read as M the conversion of S to P is catalyzed by an enzyme E" If 
there is also an second enzyme, G, that can catalyze the reverse reaction 

G 

P=>S={P+G-+GP, GP^>G + P,GP->G + S) (7) 
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we further use the double-double arrow notation 



E 
S&P 
G 



(8) 



to compactly indicate the pair of enzymatic reactions given by (6) and (7). The 
enzyme above the arrow always facilitates the forward reaction, and the enzyme beneath the 
reaction always facilitates the reverse reaction. For example, E might be a kinase and G 
might be a phosphatase molecule. Since each of equations (6) and (7) represent a triplet of 
simpler reactions, we observe that the notation of equation (8) compactly represents a total of 
six elementary reactions, each of which is in the form given by equation (1). We therefore 
take equation (1) as our input canonical form for chemical reactions. 

[0082] The corresponding output canonical form is given by the set of differential 
equations 



where the and c ia are constants that are related to the rate constants , the signs of the 
c ia are determined from which side of equation (1) the terms in equation (9) correspond to, 
and the n iaj represent the cooperativity of the reaction. The summation is taken over all 
equations in which X% appears. Multi-molecular reactions (e.g., binding to a scaffold protein) 
and multiple interacting and overlapping pathways are described in much the same way - 
there are just more reactions that must be included in our model. The canonical forms (1) and 
(9) can still describe each one of these reactions. 

[0083] Genetic transcription and translation into proteins can be described by an extension 
of equation (9) to include terms of the form 



(9) 



« j 




(10) 
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where the product runs over the various transcription factors {Xp} that influence 

production of Xf. If there are any reactions of the form (1) for Xi then the expression on the 
right side of equation (10) would be added to the right hand side of (9). In a more realistic 
system, a gene would b e influenced by a (possibly large) set of promoter and enhancer 
elements X t that bind to different sites. A hierarchical model could describe this set of 
interactions 



1 + Ju, 



u 



aei JaVa 



KJi. 



1 + K a"a 



V JM. 



*«=n!^ (H) 



where i and j index transcription factors, a indexes promoter modules, b indexes 
binding sites, the function j(b) determines which transcription factor j binds at site j 9 the J and 
K are constants, and X is a degradation rate. 

[0084] Sub-cellular components represent a higher order of biological complexity. If we 
assume perfect mixing each component can be treated as a separate pool of reactants which 
we can describe by the reaction 

X A ^X B (15) 

[0085] This is taken to mean that X in pool A is transported into pool B at some rate. When 
the concentration changes and distances involved are small such processes can be described 
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by the canonical forms in equation (1). In large or elongated cells with long processes (such 
as neurons) or when the molecules have a net charge the transport process defined in equation 
(15) can not be described by the output canonical form (9). Instead we must modify this 
ordinary differential equation into a partial differential equation to allow for diffusion, 

Ti lf = v * & VXi + QAVF) + IqJI ( 16 ) 

# j 

where the D t are (possibly spatially dependent) diffusion constants for species Q 
are charge and temperature dependent constants, and Fis the voltage. Other voltage and 
pressure dependent movement between compartments (especially those with membranes) that 
are controlled by channels and transport proteins could be described by including additional 
terms on the right hand side of equation (16) (e.g., Hodgkin-Huxley type expressions). 

[0086] A reiteration and slightly modified example of the preferred input canonical forms 
and corresponding equations described above, is provided in the following paragraphs. 
Uncatalyzed mass action reaction can be represented in the methods of the present invention 
by the following chemical reaction notation: 

A + B n ->C (17) 

where B is optional and n is an optional positive integer indicating that one molecule 
of A combines with n molecules of B to form one molecule of C. Either A or C may be the 
empty set (0). The term B n can also be written as nB . Two-way reactions can be written 
with the "double arrow" as 

A + B n ^C (18) 
[0087] The general syntax for generating ODEs from reactions is 

interpret [r] (19) 
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where r is a list of reactions of the form 

r = {{reaction,rates}, fi(X\ 
{reaction, rates},..,} 

[0088] Several examples of the translation are illustrated in table 2. 

Table 2. Specialized chemical reaction depictions and differential equation 
interpretations for uncatalyzed reactions. 



Reaction Syntax 


ODE Interpretation 


{S->P,k} 


S' = -F = -kS 


{A+B-»C,k} 


A'= B' = -C =-kAB 


{A+B n -^C,k} 


A' = B' = -C'=-kAB? 


{A^B,kf ,kr} 


A' = -B'=-kfA + k r B 


{A+BFC,kf,kr} 


A' = B' - -C = ~kfAB+k r C 


{0->A,k} 


A' = k 


{B-»0,k} 


B' = -kB 



[0089] A catalytic reaction is any reaction in which one molecule E (for "enzyme") 
catalyzes the conversion S (for "source") into P (for "product"). In the most generalized 
form, an intermediate species (called SE) is formed, 

S + E^SeXp + E (21) 

d 

where a, d, and k are rate constants. The conversion of ATP into ADP by PFKA in 
equation (4) is an example of such a reaction. The canonical form for a catalytic reaction is 

{sip,a,d,k} (22) 
which is translated into the canonical forms 
{ {S + E-»SE,a},{SE-»S + E,d} ,{SE-»P + E,k}} (23) 
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and interpreted as already described. If a second enzyme F catalyzes the reverse 
reaction 

P + F^PfXs + F (24) 

we can either add a second reaction 

{pis,al,dl,kl} (25) 
or we can use the different notation 

E 

{S^P,a,d,k,al,dl,kl} (26) 

F 

to indicate all six reactions described by equations (22) and (25). If no intermediate 
compound is formed, we can write 

{S^P,k} (27) 

[0090] A summary of catalytic reactions is given in table 3, along with specialized 
chemical reaction depictions that represent these reactions and differential equations 
interpretations of these reaction depictions. 
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Table 3. Specialized chemical reaction depictions for catalyzed reactions and 
differential equation interpretations of these depictions. 

Reaction ODE interpretation 



Syntax 





S' = -a-E-S + d-S 


{S^P,a,d,k} 


P' = k-(SE) 




E' = -a-E-S + (d + k)-(SE) = -(SE)' 


E 


S'=k .(PF)-a'E.S+d-(SE) 


{S^P,a,d,k 

F 


P^-ayF-P*^ iPF)+k-(SE) 


al,dl,kl 






F>=-a v F'P+(d l +k l )iPF) = -(PF)> 


E 

{S -» P,k} 


S r = -k-E-S = -P' 


E 

{Sl-» P} 


n , (k + vE)S n 




K n + S n 



[0091] Several forms of transcriptional regulation can be represented by the left-bar 
arrow. Although this operator can also used for catalytic Hill functions, such operator 
overloading presents no confusion because the overscript is never used in the transcriptional 
form. The input canonical forms for transcription can preferably be: 

{AHB, f [options]} (28) 

where f indicates the format of the regulatory function, and options is a list of rules 
that define system parameters (constants). Regulatory functions currently available are: 
hill (Hill functions); GRN (for Genetic Regulatory Network) neural-network dynamics; and 
NHCA (for Non-Hierarchical Cooperative Activation (Mjolsness, E., "Trainable gene 
regulation networks with applications to Drosophila pattern formation," In: Computational 
Models of Genetic and Biochemical Networks, ed. Bower,J.M., Bolouri,H. MIT Press 
(2000)) a form of pseudo-Monod-Wyman-Changeux dynamics ( Shapiro, B.E, et al, In: 
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Foundations of Systems Biology, ed. H. Kitano. MIT Press, Cambridge, MA (2001), 
incorporated by reference herein in its entirety. 

[0092] A Hill function reaction of the form 



(29) 



{ Ah* B, hill [vmax-> v,nhill-»n, 

khalf^K,basalRate^ {£o,ri} } 

can be interpreted as the differential equation 



B , = ro+ n+HdL (30) 

0 K n + A n 



[0093] The concentration of species A is not affected by the reaction. If a set of p 
reactions of the form 

{{AiH»B,hill[..-] },...,{A p H>B,hill[-] }} (31) 

where the hill options have been suppressed for clarity, the differential equation 
becomes 

(n+Z^ (32) 

[0094] The set of parameters v, are the connection strengths for the corresponding neural 
network. 



[0095] A reaction involved in a genetic regulatory network of the form: 



{Ah-* B,GRN [RGRN— > R,TGRN— » T, 

nGRN -> n ,hGRN h } ^ ' 
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is interpreted as the differential equation 



B>= (34) 

\+<T TA *+ k 



[0096] Here 7 is the connection strength and h is the activation threshold. The 
concentration of species A is not affected by the reaction. If we have a set of p reactions of 
the form 

{{Ai B , GRN [-3 } ,.»,{Ap H> B,GRN [»•] } } (35) 

the differential equation becomes 

^-/(Uexp^^+^J 1 (36) 

where the T t and the are the connection strengths and thresholds for activation of B 

by Ai. 

[0097] A non-hierarchical cooperative activation type of regulation provides a non- 
hierarchical reduction of the HCA (hierarchical cooperative activation, (Mjolsness, E. (2000)) 
algorithm that has been previously proposed. A reaction of the form 



{ A I— > B,NHCA[TPLUS -> T + ,TMINUS -> T~ , 

nNHCA -> n,kNHCA -» k, (37) 
mNHCA— >M] } 

is interpreted as the differential equation 

(i+rVT (3 8) 

jt(i+7 T "^) w + (i+r + ^) m 
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where the t + ,t~ are the connection strengths for activation and inhibition. The plus 
(+) and minus (-) superscripts can be replaced by specifying variable names such as TP or 
TM (or values) instead. Superscripts are used here to clarify the notation. Elsewhere in this 
paper the same options exist for the (*) superscript and the various subscripts used on 
variables. The subscripts could actually be indicated with a square bracket notation, e.g. A 3 [t] 
would be written as A[3][t]. 

[0098] Alternatively, the option tnhca-* t supercedes the TPLUS and TMINUS options, 



{\+mx)A n ) n 



(39) 



k(l -I&(rT)A n ) m +(1+ TQ(T)A n ) m 

where e(x) is the Heaviside step function (see equation (109), below). With a set of p 
reactions of the form 

{{Aih»B,NHCA[.»] },... / {A p H»B,NHCA[-"] }} (40) 

the differential equations (38) and (39) become 

nLfl+tftfr (41) 
*nf_i0 r +nf =1 o +^ r 

and 

B , nf =1 c^^H"'T (42) 

*nf«i o -m-wAp r + nf =1 d+ 

respectively. If competitive binding is allowed the syntax 

{<A 1 ,A 2 ,...,A p >l-»B, 

NCHA[TPLUS->{Ti,T2 ,...},...]} 

can be used, where the NHCA options are specified as lists but are otherwise identical 
to (28). The corresponding differential equations are 



(43) 



and 



(44) 



B'= ' v " 1 ' (45) 
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respectively. Because the methods of the present invention typically automatically 
generate all of the terms in all of the necessary differential equations no additional constraints 
are needed to force mass conservation. However, this will sometimes lead to the generation 
of redundant reactions, which the user may want to suppress. For example, suppose that A 
can exist in either of two forms, A and^4* 5 where the total mass of a+a* = a t is a constant, but 
that only the activated form A* can be converted into 5. Biochemically we could input the 
reactions 

A^A*, A*^>B, A+A*=A T (46) 

as: 

interpret [ { { A -> AS , kl } ,{ AS -» B,k2 } } ] (47) 

[0099] The interpret function would return the following differential equations: 

{A'[t]=-klA[t], 

AS'[t] = kl A[t] - k2 AS[t], (48) 
B'[t] = k2 AS[t]} 

[0100] If the specific concentration of AS is not of interest, for example if it is not used 
elsewhere in the reaction schema, then we can use the "complement" notation 

interpret [ { { Comp [ A, AT ] -» B , k2 } } ] (49) 

which produces the single differential equation 

B'[t]= k2 (AT- A[t]) (50) 

[0101] The complement notation can be used for any reaction; the default total 
concentration is 1. 

[0102] The following section describes how hierarchical canonical forms can be utilized 
by preferred embodiments of the methods of the present invention to transform biochemical 
representations of biological networks to mathematical representations. In standard 
biochemical notation, protein cascades are represented by a arrow-sequence of the form 
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4=> £=>••• 



(51) 



where each step (the A 9 B, ...) would represent, for example, the activation of a 
particular molecular species. The methods and systems of the present invention translate the 
cascade (17) into a computable form while retaining the biological notation in the user 
interface. Mathematically, we can specify such as cascade as a multiset 



where P is a set of proteins, R is a set of reactions, IC is a set of initial conditions, / is 
a set of input functions, and F is a set of output functions. 

[0103] To illustrate this transformation process (from the biochemical notation, such as in 
equation (51), to the mathematical notation, as in equation (52)), we consider the example 
where equation (51) represents a simple linear phosphorylation cascade. In this case equation 
(5 1) would mean that A facilitates the phosphorylation of B, which in turn facilitates the 
phosphorylation of C, and so forth. In general, a cascade can have any length, so we define 
the elements of a cascade with a simple indexed notation, e.g., 



where K is used to indicate that all the members of the cascade induce 
phosphorylation of their substrates, that is they are kinases. In general, activation can 
proceed by any specified means. 

[0104] This indexed notation is always used internally by the program. The user, 
however, has the option of using either common names or the indexed variables. There is 
still a great deal of information hidden in this expression, such as how many phosphate 
groups must be added to make each successive protein active. In the MAPK cascade for 
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(52) 




(53) 
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example (as explained below), the input signal that starts this cascade is K 4 . The output, 
however, is not K\ 9 as this notation would suggest, but a doubly phosphorylated version of 
K\. Hence for MAPK cascade we introduce a modified notation; 



K 4 

K 2 ^K* 2 % * (54) 
K X =>K*=> * 



where each phosphate group that has been added is indicated with an asterisk. From 
this notation it is clear that the input is K4 and the output is K\**. 

[0105] In general, suppose we have a cascade formed by n proteins K\ 9 K2, K n , and that 
the I th protein K( can be phosphorylated a t times. Denote by £/ the fact that kinase K t has be 

phosphorylated j (possibly zero) times. The set P of all kinases k{ in an ^-component cascade 
is then 

^ = {^1 1 = 1,2,-^,7 = 0,1,..., (55) 
[0106] The reactions in the cascade are of the form 

K/ ^K/ +l \i = l,...,n-lJ = 0,...,a J (56) 

[0107] We note at this point that this notation describes a linear cascade, in which each 
element Ki is only phosphorylated by the active form of K i+ \. It does not include other 
reactions, when, for example, K3 might, under special circumstances, phosphorylate K\ 
directly without the intermediate step of first phosphorylating K 2 . Such additional reactions 
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could be added, but they have been omitted from this presentation to simplify the discussion. 
We can also add the dephosphorylation enzymes, or phosphatases, with a double-arrow 
notation: 



R = 



Kj £ K{+ X \i=\ ) ...,n-\J = ^...,a j -l\ 
1 Phi 1 1 



(57) 



[0108] In general, it is not necessary to specify explicit conservation laws with this 
notation because they are built directly into the equations. For example, we do not have to 
separately specify that the quantities 



K Tota! = ^ K j 



(58) 



y=0 



because this is implicit in the differential equations that are built using this notations. 
We do, however, have to specify the initial conditions, 

/c= ^:/(0)|z = i,2 5 ...,n ? y=o > i,... s ^} (59) 

[0109] Next, we need to specify how the cascade is initiated. For example if K4 is not 
present until some time t on and then is fixed at a level c, would write the set of input 
functions as 

I={K 4 (t) = cH(t-t on )} (60) 
where H(t)K), t< 0 and H(t)=l, t>0 

is the Heaviside step function. In some cases, we are only interested in the total 
quantity of each substance produced as a function of time, e.g., K/(t) . More generally, we 
would also specify a set of output functions F. For example we might have F = {f, g} where 
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flT) is the total accumulated protein concentration after some time T, 



f(T)=f K?\t)dt 



on 



(61) 



and g(c) is the steady state concentration of activated kinase, 



g(c)= lim lim 



T (0 



(62) 



where c is the input signal specified /. Then the cascade is then completely specified 
by the multiset C = {/>, ICJ,F} . 

[0110] If we have an additional regulatory protein, such as a scaffold that holds the 
various proteins in equation (54) together there are additional reactions. These describe 
binding of the enzymes to the scaffold and phosphorylation within the scaffold. We describe 
the scaffold itself by defining an object S pi ^.. iPn where n is as before (the number of 
kinases that may bind to the scaffold, or alternatively, the number of "slots" in the scaffold) 
and pi e{£fi,\,...,ai} indicates the state of phosphorylation of the proteins in each slot. Thus if 
Pi = s (or, alternatively, -1) the slot for K t is empty, if p t = 0, A? is in the slot, etc. For a 
three-slot scaffold, for example, we would add to the set P the following set 



[0111] To describe binding to the scaffold, we would also add to the set R the following 
reactions 



F= ^i jk |i=fi , ,0,l,...,fli,j = fi , ,0,l,...,a2 9 k = £,Q,\,.. .,03] 



(63) 




(64) 
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where the indices run over all values in the range 

Je^X^a^j (65) 
yi [0,1,. ..,a;, i=j 

[0112] For the three-member scaffold this would be 

U ^idt + K i = e,0,...,<nJ = 0,...,a 2 , * = ^0,...,a 3 ] (66) 

[0113] Finally, we have phosphorylation in the scaffold. This can be done either by a 
protein that is not bound to the scaffold, e.g., for the input signal, 

R " = $mr ,Pi-l=j<a i - l fi="h"->Pn + K S Ph" >Pi-^j+\,Pi= a i>-''>Pn j ( 67 ) 

where the two-sided double arrow (o) is used as shorthand for the (possibly bi- 
directional) enzymatic reaction, or by one that is bound to the scaffold, 

Rm=t $Pir-,Pi-l=j<ai-iJ>i=<*i>" m >Pn ~* S Pl>-~>Pi-Uj +\Pi= a h" '>Pn } 

or by some combination of the two, all of which must be added to the reaction list R. 
For the three-slot scaffold with external signal K 4 that activates K 3 , we have 

= {W -> S+Uifri = °'"" a i " ^ = ff '°- ^ } (69) 



and 



*4 
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[0114] Typical a\ values for this type of cascade are ax-a^l and #3=1. 

[0115] As an example, let us continue with the above-mentioned three-member cascade 
that is initiated with £4, In what follows, and throughout this specification, we refer to 
Cellerator™, a Mathematica® package that implements the above algorithms. The source 
code of Cellerator™ is found on the compact disk filed with this specification. One of 
ordinary skill in the art will recognize that other programs that implement the above 
algorithms, or other embodiments of the methods of the present invention, can be developed 
using the general teachings of this specification and program development tools known in the 
art, especially programming tools targeted to mathematical models, especially models 
involving differential equations, as described above. 

[0116] In Cellerator™ we have defined the function 

genR^cts[kinase-name y «, {a,-}, phosphatase-name], 

where Mnase-name and phosphatase-name are the names we want to give to the 
sequences of kinases and phosphatases, respectively, and n and a x are as before. The 
command illustrated in FIG. 12 is provided by Cellerator™ for performing the methods of the 
present invention to generate the above set of reactions (54). 

[0117] The input is in the first line while the output is the second line. Alternatively, the 
user could specify the set of reactions explicitly, or copy the output to a later cell to manually 
add additional reactions. If RAF has been set up as an alias for K$ then the rate constants are 
specified by a content-addressable syntax, e.g., as illustrated in FIG. 13, corresponding to 

a \ h 

RAF+ RAFK RAF - RAFK => RAF* + RAFK (71) 
<h 
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and 

* a 2 * h 

RAF + RAFP&RAF - RAFP=> RAF + RAFP (72) 

and so forth, where the numbers over the arrows indicate the rate constants (and not 
enzymes, as with the double arrow notation). Cellerator™ first translates the five high-order 
reactions (equation 54) into the corresponding set of 30 low-level reactions. Each low-level 
reaction (such as intermediate compound formation) is determined by applying the 
appropriate enzyme-kinetics description, and has a unique rate constant. The low-level 
reactions are subsequently translated into the appropriate set of 21 differential equations for 
the eight kinases, three phosphatases and ten intermediate compounds. When scaffold 
proteins are included (discussed below) these numbers increase to 139 high level reactions, 
348 low-level reactions (300 without kinases), and 101 differential equations (85 without 
kinases). 

10118] As another illustration of an implementation of the use of canonical forms to 
represent biological networks according to preferred embodiments of the methods of the 
present invention, consider the glycolytic step in which an activated form of the enzyme 
phosphofhictokinase (PFKA) catalyzes the phosphorylation of fructose 6-phosphate (F6P), 
converting ATP to ADP in the process, 

F6P + PFKA + ATP^X-> F6P * +PFKA + ADP (73) 

where X is a sequence of intermediate compounds that are formed during the process. 
In a reduced model of glycolysis (Goldbeter, A., and Lefever, R., BiophysJ. 12:1302-1315 
(1972)). PFK catalyzes the removal of a phosphate from ATP, 

ATP + PFKA%Y% ADP + PFKA (74) 
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where Y is the intermediate compound formed by PFKA and ATP, and k h k_ x and k 2 
are rate constants. The specialized chemical reaction depiction for (74) can be 

PFKA 

ATPti AD (75) 
[0119] To include the rate constants (75) could be replaced with 

PFKA 

{ATPU ADP,k v k_v (76) 

[0120] Omitted rate constants can take on default values. In addition ADP also activates 
PFK, ATP is continually produced, and ADP is continually degraded. In chemical notation, 

^ATP (77) 
adpX (78) 

2ADP + PFKH PFK (79) 

[0121] The canonical form for conversion of A to B can be a -» B . To describe (77) and 
(78), preferred embodiments of the methods of the present invention such as Cellerator™ use 
the special symbol 0 to indicate conversion to (annihilation) or from (creation) the empty set. 
A bi-directional arrow (0) can indicate that the reaction can proceed in both ways, as in 
equation (79). 

[0122] The reactions, once expressed in canonical form, can then be translated to 
differential equations according to the law of mass action using the rules described above for 
interpretation of canonical input forms of chemical reactions as differential equations. This 
interpretation can be initiated by a user for example, by calling an interpret function that 
is programmed to perform the rules described above. The complete set of canonical forms 
for the simplified glycolytic model, along with the corresponding set of automatically 
generated differential equations, is illustrated in Figure 3. When the same chemical species X 
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occurs in multiple reactions, one term (or set of terms) can be generated in the differential 
equation for )C{i) for each reaction. The final interpreted differential equation then gives the 
sum of all these reaction terms (see, for example, the differential equation for ADP in figure 
3. 

[0123] The methods of the present invention can be used to study biological networks that 
are important for the development of an organism. For example, by extending the standard 
signal transduction network hierarchy so that nodes represent cells rather than chemical 
concentrations we can describe multi-cellular systems such as plant shoot apical meristems 
(SAMs). Links then refer to intercellular - rather than intermolecular - interactions. The 
essential physiology of many developmental processes can be captured by only a small 
number of cells. Thus, the complexity of multi-cellular networks arises from the number of 
interactions rather than the number of nodes. This is because many different mutually 
interacting signal transduction networks need to be represented within each cell. There will 
be many instantiations of essentially the same network (e.g., mitotic oscillators) in each cell, 
or even multiple instantiations of the same network (e.g., MAP-kinase cascades or 
transcription complexes) within a single cell. The 215-cell SAM illustrated in FIG. 4, for 
example, has 976 near-neighbor links (out of a possible 215 x 214 = 46,010 cell-cell links). 
Using the minimal developmental system presented below, this SAM is described by 1690 
ODEs. Birth and death processes change the total number of cells - and hence the total 
number of differential equations required to describe the system - and thus pose an even 
more difficult problem. These must be dealt with as a time-dependent variable structure 
system (VSS). 

[0124] The paradigm of organisms-as-graphs can represent many of the basic features of 
developing tissue. Furthermore, the methods of the present invention can be extended to 
utilize a variable-structure graph-based algorithm to describe simple developmental 
processes. In particular, a VSS can be implemented using a pre-packaged fixed-structure 
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differential equation solver, according to the methods of the present invention, as described in 
further detail hereinbelow. 

[0125] An automated method for simulating a developmental process of an organism 
according to the present invention can include the following steps: 

a) receiving initial condition values and process parameters for the developmental 
process of the organism; 

b) representing the organism or a tissue within the organism by a graph data 
structure, wherein the graph data structure includes: 

i) a list of links, each link representing the interaction between two cells; 

ii) a lineage tree recording the family tree of cell birth for the cells represented 

by the list of links; and 

iii) a list of nodes, each node representing a cell of the cells represented by the 
list of links, with an embedding describing the location of the cell in Cartesian coordinates 
and a set of differential equations describing the time evolution of the location of the cell, the 
differential equations including the initial condition values and process parameters, each node 
having a model that includes a system of differential equations and associated parameters 
describing the developmental process; and 

c) repeatedly solving the set of differential equations in a series of steps for a 
defined number of steps, wherein after each step results are generated and compared to a 
threshold to determine whether the developmental process has reached a trigger point for 
changing the number of nodes in the list of nodes, thereby simulating the developmental 
process. 

The method can preferably include: 

d) graphing the nodes using the Cartesian coordinates. 

[0126] The developmental process can be, for example, cell division, wherein reaching the 
trigger point adds a new node to the list of nodes. 
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[0127] The developmental process can be, for example, cell death, wherein reaching the 
trigger point removes node from the list of nodes. The initial condition values and process 
parameters are typically input by a user before being received by a computer system 
performing a method according to the present invention. 

[0128] To provide an object-oriented representation of organisms-as-graphs we have 
developed the concepts of domain and field. A domain, in analogy with the standard use of 
the term in mathematics, is an object-oriented representation of the mathematical domains 
that are relevant to solving a particular scientific problem. Afield is a representation of a 
function that maps domains to the real numbers. We instantiate a domain by applying the 
expand function to it. 

[0129] For example, we can define the integer domains /(«) , /(«,«), and i(m 9 n, P )to 
represent the sets {t,2,...,»}, $n,m+\ 9 ... 9 n} 9 and $n 9 m+p,m+2p 9 ... 9 m +gp} , where q is the largest 
integer such that m +qp<n, respectively. Then the expand function, which we will denote by E 
herein, maps the object representation of the domain to an implementation of that domain, 
e.g., 

E [/(»)]= {1,2,. ..,«} 

E [I(m,n)] ={m 9 m+lm+2 9 ,.. 9 n} (80) 
E|7(m,«,/?)]= {m,m+ p 9 m+ 2p 9 ...,m+ qp} 

[0130] Thus the expand function applies the implementation. In actual practice we would 
generally never have to deal with the expanded function itself. To see what a field is, suppose 
g is any function that operates on the integers. Then the field operator, which we will denote 
by F herein, 

F:D->£>xR (81) 
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associated with any function /(*):/>-► R (here R denotes the real numbers) generates a 
set of ordered pairs 

F (/,^)= I* ^} ( 82) 

[0131] We will use the shorthand f(D) for the set 

f(D) = {ft | (*„/,) (/,£>), 4 eZ>} = {/(</,) | <f, €£»} (83) 
which we will call the range of the field corresponding to/ 

[0132] To illustrate the concept of fields, suppose that D= /(5, 19,3) and that /(*) = x . Then 

E[D]= {5,8,1 1,14,17} (84) 

and 

F [f,D] = {(5,25),(8,64),(1 1, 1 1 1),( 14196),(17, 289)} (85) 

J[D]= {25,64,111,196,289} (86) 

[0133] Using fields one can also define composite functions that map domains to the real 
numbers. For example, let A c and /X -» <n be a function. Then we define the sum function 
s(/,i4):S[flt] (where spy] is the subset operator that gives the set of all subsets of the set U) 

[0134] The sum function applies/to every element of A and then sums the result; equation 
(86) might be read as "the sum ofJ(x) over the set A." IfD is a domain and we let A = e [D] we 
can define the sum operator - which gives the sum of the function/*) over a domain D - as 

I d /X*)-£(/,e [£>])= Z ;ceE(D) /W = E^ /(Z)) ^ (88) 
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[0135] For example, letting g be the identity function (g(x) = x), 

[0136] We next define afield of domains as a function that maps domain elements to 
domains. Generalizing equation (81), let^ be a set of domains 

A={D [ ,D 2 ,...} (89) 

and let f.d->a be a function that maps domain elements to domains. Then we define 
the field of domains F :D~>D *a associated with F as the set of ordered pairs 

T(F,D)={(d l9 F(P))\di*D} (90) 

and write 

F[D)={f(di)\di^D} (91) 
for the range of the field of domains corresponding to F. 
[0137] Fields of domains are particularly useful because they can be defined dynamically. 
In a developmental simulation, for example, domains can be used to represent sets of cells. 
We will then often need to answer the following question: given any cell in an organism, 
what other cells interact with it? We can define a domain T (for "Tissue") to represent a 
collection of cells, possibly an entire organism, and let a= S[7> {7\,r 2 ,...} be the set of domains 
generated by considering all possible combinations of cells in T. Then the elements t gT 
represent cells of T. Each t has associated with it a set of other cells 
Nbjt] s {t t eT\t,ti are neighbors} eA. The mapping N;teT-+ Nbr[t] zA gives us a neighborhood 
function that tells us which cells are neighbors of other cells. The corresponding field of 
domains N :T->TxA formally defines this as the set of ordered pairs N [t] = {(t,N(t))\teT} , or 
more concisely, the range of N is expressed as # <r)= {N(t)\t <=T} . It is possible to implement 
the neighborhood field of domains via a potential function operator v(t h tj) that is nonzero 
only when there is some interaction between the two cells. We want N [t] to give us the 
ordered pair (r, N[t]) , where 
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N[t] = {tjzT\V(tJj)*0} (92) 

[0138] We can now proceed to define dynamic operators. Let g:$ [t;-]} ~>r be a function. 
Then by generalizing our sum function (see equation 49), we can calculate a sum over 
neighbors operator £ N g as 

Zjv W rffl=%,E[N [t]]) = Z neN[t] g(n) (93) 

to give the sum over all neighbors of t of the function g. To illustrate the sum over 
neighbors, suppose that i/^t f u) is the electrostatic potential between cells t and u. The total 
electrostatic ^(0 potential at t is 

wW -I**i M *'" ) (94) 

where //picks out those neighbors u of t that are not electrically shielded from /. 
[0139] Domains can be implemented, for example, with uninstantiated Mathematica® 
functions; the expand function is perform using up-values, e.g., the integer domain /(»?,«) is 
defined as 

expand [intDomain [m_, n_] ] A : = Range [m, n] (95) 
with similar definitions for the other domains. The notations of pure function, Apply, 
and Map very nicely fit into the concept of fields and operators on domains. However, this 
formal presentation gives us a mechanism whereby domains and fields could be implemented 
with any computer language, freeing us from the specifics of Mathematica®. 

[0140] In another aspect, the present invention provides an organism-as-graph approach to 
simulate a growing organism. For this approach a growing organism (or more likely, selected 
tissue within a growing organism) is represented by a graph data structure. In one preferred 
embodiment, a graph is composed of three things: a list of nodes, a list of links, and a lineage 
tree. Each node represents a particular cell; each link represents the interaction between two 
cells; and the lineage tree records the family tree of cell birth. The overall object hierarchy is 
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illustrated in FIG. 5. The shoot apical meristem illustrated in FIG. 4 gives an example of the 
spatial orientation of the physical graph represented by this data structure. 

[0141] For example, a graph domain according to preferred embodiments of the present 
invention, can be represented as 

g = graphDoma in [nodes— >{nl ,n2 ,...} , /n£\ 
links -»{ll,l2,...} ,lineage-»T 

[0142] Each node contains precisely one embedding and one or more models, 

nl = nodeDomain [embedding el /Q7\ 
models -»{ml,m2}] 

[0143] The embedding describes the location of the cell in Cartesian coordinates, and an 
optional set of differential equations (and corresponding initial conditions) that describe the 
time evolution of those coordinates. In the simplest implementation only a single point is 
needed to describe a node's embedding; however, there is no reason that additional 
information, such as the shape of a cell and its geometric relationship to other cells, could not 
be included. 

el = embeddingDomain[ 
position ->{xl,yl,zl}, 

nodes ->{xl' = fl[xl,yl,zl], } ' 
ic -> {x 1 0,y 1 0, . . . } ,time -> tO] 

[0144] The time indicates when the initial conditions are applied. 

[0145] Each node can contain one or more models. Each model domain contains a system 
of differential equations and associated parameters that describe some aspect of that cell's 
signal transduction network. In theory, one could put all of the differential equations for a 
cell in a single model. In preferred embodiments, however, different models are generated for 
distinct parts of the network. The differential equations in one model can refer to variables in 
another model (in fact, they can refer to variables in another node, as well). For example, to 
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describe cell division it might be convenient to group the equations into separate models 
representing the Gl checkpoint, the G2 checkpoint, DNA replication, Mitosis, etc. For 
example, if we want the glycolytic system r illustrated in FIG. 3, all that is necessary to 
specify is 

{ eqns , vars } = interpret { r } ; 
rn = modelDomain [ 

molecules -> vars , 

nodes -> eqns] ; 

[0146] Options that are not specified take on default values; for example, unspecified 
initial conditions and the initial time are set to zero. If a number of nodes n h ...,nk were 
present, each of which required a glycolytic system, indices could be added to the variables, 
i.e., ADP becomes ADP [i] , i=l,„Jc. One of ordinary skill in the art recognizes that this 

process is easily automated. 

[0147] The minimum information contained in a link is a reference to two nodes, such as. 
11= iinkDomain[nodes -» {i,j}] . In preferred embodiments each node is numbered as it is added 
to the graph, so the nodes are represented by those integers. Pointers could also be used. 
Additional information about the link can also be placed in the data field. Each link may also 
contain an associated "spring" field that represents the dynamics of cell growth. These 
springs are used to define a potential function (FIG. 6) that is used to optimize the position of 
the nodes after each growth step. The potential function for a node is 

V^^LfyQxt-XjhdJ (100) 
where the sum is taken over all nodes nj that are linked to n h x,- are the vector 
positions of the nodes, the are interaction strengths (zero for non-interacting nodes), and 
the give the desired distance between the nodes. The potential gets "turned off* when the 
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interaction distance becomes too large in some sense; to prevent a discontinuity in the 
potential after cell division we modify (100) as: 

V,j = ^Z,W<l*f-«; Mff-d < 101 ) 



where 

C ..-M* rf (102) 

[0148] The biological dynamics of growth are then described by assigning a relationship 
between dy and the model variables in nodes «,• and «,-. For example, suppose that the model 
stipulates that a cell's mass grows at a constant rate, 

4&=kM, (M^Mne*) (103) 

at 

and that we describe the cell as a sphere of radius % . Then if the density remains 
constant, 

f = (104) 

then we might constrain the equilibrium spring distance to be 

d^m + Rj (105) 

if the cells are assumed to be touching. After each step, the potential is then 
minimized (e.g., via gradient descent or simulated annealing) to determine the new position 
of the cell. For a small number of nodes the optimization can be replaced by adding an 
additional differential equations of the form 

dt 3 

[0149] This forces the solution to follow a gradient descent towards the minimum of the 
spring function. As a final note, the term "spring" here is used because of the form of the 
potential function, and does not actually describe position dynamics (otherwise, equation 106 
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would be second order in time). In fact, equation 106 gives an exponential relaxation 
towards local minimum, akin to the motion of a spring in a highly viscous medium. 

[0150] The methods of the present invention can be used to simulate variable structure 
systems. Biologically, the birth of new cells and the death of old cells occur as a result of the 
concentration of some chemical species passing a species. For example, the amount of ATP 
can fall so low that metabolic processes cease (death) or the quantity of S-phase promoting 
factor (SPF) may increase so high that DNA replication is induced (birth). In the first case, 
the number of active cells in the system ceases to exist; in the second case, an additional cell 
is added. 

[0151] It is easier to represent death than birth. We can assign an indicator, or "aliveness " 
variable i k to each cell k, where i k = l if cell k is alive, and i k = o if cell k dies. Then if we 
multiply the concentration of each chemical species x in cell k by i k , all equations for a given 
cell effectively disappear when the cell dies. In principle we could do the same thing to 
describe birth. However, this would require that we know in advance the total maximum 
number of cells we would ever want to simulate. We would never be allowed to exceed this 
number. Besides its inelegance such an algorithm could be a computationally very 
expensive. Thus we propose an alternative mechanism to represent cell birth. 

[0152] We assume that any change in the size of the system (i.e., the number of variables 
and differential equations and/or the values of corresponding initial conditions) is triggered 
by some threshold passage. Suppose that our system is composed of M+N variables, 
{*i}i=U,# and Wi=i,«.tf9 where each of the N variables Xi can trigger some event when it 
passes a corresponding threshold 25 , and there are M remaining variables w t in the total 
system. Then we define a set of flag variables fa ,**},•= where there is one pair of flag 
variables for each trigger variable Xi . Then if the event is triggered the first time that Xi > T t 
we force the corresponding flag variables to satisfy the ODEs 
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m. = @ {xi -T i ),y i (0) = 0 



(107) 



dt 



=efyi), z ( (0)=o 



(108) 



where ®(x) is the unit step function 



fo, *<o 



(109) 



[0153] A similar equation can be written if the event is triggered by x t < T t . Then yi 
increases linearly from zero whenever x t is above threshold. Even if falls back below 
threshold, yi will remain positive; it stays fixed at whatever value it had when falls back 
below threshold. The second new variable z { increases linearly with time whenever yi is 
positive. Thus z { measures the total amount of time that has elapsed since x,- first passed its 



[0154] These new variables are then used as follows. We submit our entire system 

^{n}i=i.^M to a fixed-structure solver for some time interval of length T. Here 
T is the total duration of numerical integration, and is not the integration step size, which is 
typically (several) orders of magnitude smaller. We chose T to have a magnitude close to the 
expected time between triggering events (e.g., the length of the cell cycle). Presumably the 
solver will return the solutions to the differential equations (in the form of interpolatable 
functions of some sort) over the time interval (0,r>. Then we evaluate each of the variables z t 
at the end of the solution interval (time t = T). If z,-(7) = 0 v/= u,tf then no even occurred. 
Otherwise some event has occurred. Since it is possible that more than one of the z t are 
nonzero we must examine them all to determine which one is the biologically meaningful 
one, i.e., which event occurred first. To determine this we calculate 



threshold. 



- max Z[ 

i<i<N 



(110) 



Then x k is the first variable that passed threshold, and this event occurred at time 
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t = T-z k (HI) 

[0155] So we accept the solution provided over the interval (0,r- z k ) but reject the rest of 
the solution, over the interval (T-z k j).. We define a new set of initial conditions at T-z k by 
evaluating the numerical solutions for x t and w ( at that time, and setting y t (T-z k )^0 and 
zt( T- z k ) = o . We then add whatever new variables are necessary to the system at T -z k (along 
with their corresponding differential equations and initial conditions). If some of these new 
variables can trigger events we must also add whatever new yi and z t variables are required 
to describe these new potential event triggers. Then we have a new system with a larger 
number of variables. The entire process is then repeated over the interval (T~z k , lT-z k ) and is 
iterated as desired. 

[0156] The following examples are intended to illustrate but not limit the invention. 

EXAMPLE 1 

SIMULATION OF MAPK PATHWAY WITH SCAFFOLDS 

[0157] This example provides a method for automatic model generation of a dynamic 
regulatory network, the mitogen-activated protein kinase (MAPK) cascade signal 
transduction module operating in solution or when bound to a scaffold. 

[0158] The mitogen-activated protein kinase (MAPK) cascades (Fig. 1) are a conserved 
feature of a variety of receptor mediated signal transduction pathways (Garrington, T.P. and 
Johnson, G.L., Curr.Opin.CellBiol, 11, 21 1-218 (1999); Widmann, C, et al., PhysioLRev. 
79, 143-180 (1999); and Gustin, M.C., et al, Microbiol Mol Biol Rev,, 62, 1264-1300 

(1998) ). In humans they have been implicated in transduction of signals from growth factor, 
insulin and cytokine receptors, T cell receptor, heterotrimeric G proteins and in response to 
various kinds of stress (Garrington, T.P. and Johnson, G.L., Curr.Opin.CellBiol 11, 211-218 

(1999) ; Putz, T., et al., Cancer Res, 59, 227-233 (1999); Sternberg, P.W. and Alberola-Ila, J., 
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Cell 95, 447-450 (1998); Crabtree, G.R. and Clipstone, N.A., Annu. Rev. Biochem. 63, 1045- 
1083 (1994); and Kyriakis, J.M., Biochem. Soc. Symp. 64, 29-48 (1999)). A MAPK cascade 
consists of three sequentially acting kinases. The last member of the cascade, MAPK is 
activated by dual phosphorylation at tyrosine and threonine residues by the second member 
of the cascade: MAPKK. MAPKK is activated by phosphorylation at threonine and serine by 
the first member of the cascade: MAPKKK. Activation of MAPKKK apparently proceeds 
through different mechanisms in different systems. For instance, MAPKKK Raf-1 is thought 
to be activated by translocation to the cell membrane, where it is phosphorylated by an 
unknown kinase. All the reactions in the cascade occur in the cytosol with the activated 
MAPK translocating to the nucleus, where it may activate a battery of transcription factors by 
phosphorylation. 

[0159] MAPK cascades have been implicated in a variety of intercellular processes 
including regulation of the cell cycle, apoptosis, cell growth and responses to stress. These 
molecules are of crucial importance in the development of memory and wound healing. 
Abnormal changes in MAPK pathway regulation often mediate various pathologies, most 
notably cancer. This central role of MAPK mediated signal transduction in most regulatory 
processes makes it an especially attractive research and modeling object. 

[0160] Signal transduction through a MAPK cascade can be very inefficient unless 
additional regulatory proteins, called scaffolds, are present in the cytosol. Scaffold proteins 
nucleate signaling by binding two or more MAP kinases into a single multi-molecular 
complex. It has been reported previously that scaffolds can both increase and decrease the 
efficiency of signaling in a concentration dependent manner (Levchenko, A., et al., Proc. 
Natl Acad. Set USA, 97(1 1):5818-23 (2000), incorporated by reference herein in its 
entirety). In addition they can reduce the non-linear activation characteristics of the cascade. 
These properties may be crucial for global and local activation of MAPK as scaffold proteins 
may selectively translocate to small subcellular compartments, thus locally facilitating or 
inhibiting MAPK activation. This Example demonstrates the use of a software package 
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called Cellerator™ to perform a method of the present invention to substantially improve 
earlier MAPK cascades models and study the parametric dependence of the MAPK cascade 
in a manner not investigated in the preceding report. 

[0161] As described above, addition of scaffold proteins into the MAPK reaction system 
results in markedly increased number of states and equations describing transitions between 
them. Here the benefits provided by the methods of the present invention can be appreciated, 
as a simple sequence of commands can lead to automatic description of all reactions 
involving scaffold-kinase complexes (see FIG. 14). 

[0162] In the simulation performed in this Example the first goal was to verify the 
automatic model generation for scaffold-mediated MAPK cascade as implemented using a 
method according to the present invention. As a basis for the comparison we referred to a 
previous report describing a quantitative model of the effect scaffold proteins can play in 
MAPK mediated signal transduction manner (Levchenko, A., et al., (2000), incorporated 
herein by reference in its entirety). The same parameter values were used for the analysis as 
those used in Levchenko et al. (2000) (shown in Table 4 according to the parameter 
designations of Levchenko et al (2000)). 

Table 4, Parameter values used in the model (unless otherwise stated) 



Parameter J 


Value assumed* 


Concentrations, ^M 


[MAPKKK] 


0.3 


[MAPKK] | 


0.2 j 


[MAPK] 


0.4 i 


[MAPKKK K-ase] 


o.2 : 


[MAPKKK P-ase] 


0.3 


[MAPKK P-ase] 


0.2 


[MAPK P-ase] 


0.3 


Association rate constants, \i M'^sec" 1 


ai \ 


1 \ 


&2 ; 


0.5 \ 


0j 


3.3 
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a 4 1 


10 


as 


3.3 i 


a 6 


10 


a? 


20 


as 


5 \ 


a 9 j 


20 


aw ! 


5 


on i 


10 


on 2 


10 


Dissociation rate constants, sec 


di 


0.4 


d 2 


0.5 j 


d 3 


0.42 


d 4 ! 


0.8 | 


ds 


0.4 j 


de 


0.8 


d 7 


0.6 


d 8 


0.4 


d 9 1 


0.6 


dw J: 


0.4 1 


off 


0.05 


off 2 


0.05 j 


off, 


0.05 ! 


off 4 


0.5 


Reaction rate constants, sec" 1 


ki 


0.1 


1 k 2 


0.1 


\ k 3 


0.1 


I k 4 


0.1 


1 k 5 


0.1 


\ h 


0.1 \ 


I k 7 


0.1 


1 k 8 


0.1 


I k 9 


0.1 


! k w 


0.1 i 



*The concentrations and individual a, d, and k values correspond to estimates in 
Ferrell, J. E., J, Trends. Biochem. Sci. f 21, 460-466 (1996) and Bray, D. & Lay, S., Proc. 
Natl Acad, Set USA 94, 13493-13498 (1997). 



GT\6265491.1 
104662-60 



ATTORNEYDOCKET NO: 
CIT1440-1 



60 



[0163] Additionally, all the assumptions of the model of Levchenko et al. (2000) were 
made for this analysis. These assumptions included 1) dephosphorylation of kinases in 
scaffolded complexes is precluded; 1) there is no binding of partially or fully activated 
MAPKK or MAPK to the scaffold; the activation of MEK and MAPK when bound to a 
scaffold is processive rather than distributive with the reaction rate equal to the rate of a 
single phosphorylation reaction; 3) kinases bind to the scaffold independently of one another, 
i.e., there is no cooperativity in the binding; 4) scaffold molecules do not possess catalytic 
properties, so that the reaction rates within a scaffold complex and in solution are equal; and 
5) reactions take place in a homogenous environment with no additional mechanism for 
compartmentalization of molecules, thus we ignored MAPK translocation to the nucleus. 

[0164] A representation of the MAPK pathway was input into the Cellerator™ platform in 
the following manner. The following command was entered to instruct Mathematica™ to 
load Cellerator™ into memory: 

Get [ToFileName [ 

{"PikachusRescue'.Research", "CelleratorFiles", "cellerator.m M }]J; 

[0165] The following input command was entered to generate the initial set of reactions 
for a MAPK cascade on a Scaffold and assigned the list of reactions to the variable c: 

c =MAPKCascade [ 
signal -» RAFK, 

phosphatase {MAPKP, MEKP, RAFP}, 
stages — > {2, 2, 1}, 

solutionSignalRates — > {al, dl, kl, a2, d2, k2}, 

solutionPhosphorylationRates — » 

{{a3, d3, k3, a4, d4, k4}, 8 a5, d5, k5, a6, d6, k6}, 

{a7, d7, k7, a8, d8, k8}, 8 a9, d9, k9, alO, dlO, klO}}, 

scaffoldName — > S, 

scaffoldSignalRates -> { kla, dla, kl } , 

scaffoldPhosphorylationRates ->{{ k7, k9a}, {k3, k5a}}] 

[0166] The output including the initial set of reactions is shown in FIG. 17. 
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[0167] The following command was entered to generate the Differential Equations and 
assign the list of differential equations to the variable s: 

s = interpret [c] 

[0168] The output including the differential equations is shown in FIG. 18. 

[0169] When the same assumptions were made as those of Levchenko et al. (2000) 
(described above), exactly the same solution for the three-member scaffold case was obtained 
using the automated methods of the present invention (FIGS. 17-18). Note that the equations 
for a two-member scaffold were provided in Levchenko et al. The case of a three-member 
scaffold is modeled similarly, with the number of equations describing scaffold complexes 
tripled because of increased combinatorial possibilities for complex function. This 
convergence of results verified the model generated by a method according to the present 
invention. In addition, the difficulty of manual generation of all the necessary equations, a 
limiting factor of the previous study, has now been removed. 

[0170] We thus attempted to study a more detailed model, in which some of the previous 
assumptions were relaxed. The following syntax and commands provides examples of syntax 
and commands entered by a user to direct Cellulator™ to numerically solve the differential 
equations (i.e. to initiate the equation solver function of Cellerator™. Similar commands and 
syntax were used to generate the results provided later in this Example. 

[0171] The following commands were used to assign values to the rate constants (the 
input parameters): 

al =1.; a2 =0.5; a3 =3.3; a4 =10.; a5 =3.3; 

a6 =10.; a7 =20.; a8 =5.; a9 =20.; alO =5.; kon =10; 

kpon =0; dl = 4; d2 = 5; d3 = 42; d4 = 8; d5 = 4; 

d6 = 8; d7 =.6; d8 = 4; d9 = 6; dlO = 4; dla =0; koff = 05; 

kpoff =koff; kl = 1; k2 =1; k3 = 1; k4 =1; k5 = 1; k6 = 1; 

k7 =1; k8 =1; k9 = 1; klO =1; kratio =1; kla =100; 
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k9a =kratio *k7; 
k5a=kratio *k3; 

[0172] The following commands and syntax will solve the system of differential equations 
using the above parameters and initial conditions as specified following the arrow 
"initialConditions", and will plot all variables. The result will be returned as a list of 
Mathematica® "interpolating functions" that are assigned to the variable sol. The 
interpolating functions will contain predicted values of the function for a "timespan"of 500 
seconds (e.g., an interpolated table of values): 

sol =run [s, 
initialConditions — ► 
{K[2,0][0]== 0.2, 
K [3, 0] [0] == 0.3, 
K [1, 0] [0] == 0.4, 
RAFP [0] == 0.3, 
MEKP [0] == 0.2, 
MAPKP [0] == 0.3, 
S [-1, -1, -1] [0] ==0.1, 
RAFK [0] == 0.1 
}, 

timeSpan— > 500, 
plotVariables — > All] 

[0173] The following function gives an example of how a single parameter of interest, the 
integrated total concentration of a particular molecule over some time period, e.g., 100 
seconds, can be calculated from the above interpolatingFunctions. First, define the function 
that is to be evaluated: 

integratedOutput [t__, q_J : 
(° n+i First [Evaluate [K [1, 1] [time] /.q] dltime 



[0174] Finally, evaluate the function on the Cellerator™ solution: 
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integratedOutput [100, sol] /. ton -» 0 
[0175] The output generated for this exemplary input is 0.827749. 

[0176] The use of Cellerator™ using syntax and commands similar to those described 
above, allowed us to perform systematic sensitivity analyses of the assumptions made in our 
description of the role of scaffold proteins in MAPK cascade regulation (Levchenko etal., 
2000) using commands and syntax similar to that described above. We previously described 
dual MAPKK and MAPK phosphorylation within the scaffold to proceed as a single step 
(processive activation). This is substantially different from a two-step dual phosphorylation 
sequence occurring in solution. In this distributive activation, the first phosphorylation event 
is first followed by complete dissociation from the activating kinase and subsequently the 
second phosphorylation reaction occurs. The assumption of processive phosphorylation in 
the scaffold has some experimental basis. Mathematically, it is equivalent to assuming that 
the rate of the second phosphorylation reaction is fast compared to the first reaction. 
Although this assumption was partially relaxed in our previous report, no systematic study of 
relaxation of this assumption has been performed. Using the Cellerator™ software to run a 
method of the present invention, we performed a systematic investigation of the role of 
increasing or decreasing the rate of the second phosphorylation within the scaffold compared 
to reactions in solution by using the rates from Table 4, and increased or decreased by a 
factor of 1000. The results for the case when the two rates are equal are presented in Fig. 7. 
It is clear that relaxation of this assumption results in a substantial decrease of efficiency of 
signal propagation. 

[0177] Similar simulations were performed to investigate the effect of allowing formation 
of a complex between MAPKKK in the scaffold and MAPKKK-activating kinase, as well as 
the effect of allowing phosphatases to dephosphorylate scaffold-bound kinases. In all cases 
the parameter values used in simulation are equal to those used for corresponding reactions in 
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solution (for the full list of parameters see Levchenko et al., 2000). The results are presented 
in Fig. 7. Again, new assumptions resulted in substantial down-regulation of efficiency of 
signal propagation. It is of interest that the position of the optimum scaffold concentration 
(i.e., the concentration at which the maximum signaling is achieved) is insensitive to making 
these new assumptions. This agrees with the analysis in (Levchenko et al 9 2000), which 
suggested that the position of the optimum is determined only by the total concentrations of 
the kinases and their mutual interaction with the scaffold. 

[0178] The results presented in this Example show that automatic model generation 
according to the present invention can simplify the transition from an informal, cartoon-based 
description of a reaction pathway, or a network of pathways, to a system of differential 
equations. This transition is obtained via a rigorous description of enzymatic kinetics and 
other biochemical processes and is implemented utilizing symbolic translation. In addition to 
facilitating the potentially burdensome task of correctly writing out all of the necessary 
equations, this methodology provides an explicit and flexible way of controlling successive 
stages of model creation. Furthermore, user intervention is possible both at the stage of 
conversion of an informal pathway description into a set of chemical reactions and at the later 
stage of mapping these reactions to the corresponding mathematical forms. This flexibility 
increases the ability of the user to participate in building and modifying the model at a level 
limited only by his or her expertise. 

[0179] We have demonstrated the automatic generation of symbolic differential equations 
using a generic three-member scaffold, the MAPK cascade mediated signaling system. The 
implementation of software that performs methods of the present invention, as demonstrated 
in this Example, called Cellerator™, is capable of generating and solving these 101 
differential equations, a task not achieved in the previous detailed study of the effect of 
scaffolds. Such automated model generation will prove especially useful in describing even 
more complex biochemical reactions that involve the formation of multi-molecular 
complexes. Such complexes may exist in numerous states, each requiring a corresponding 
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equation for its dynamical description. Because of the combinatoric expansion of reaction 
possibilities, correctly writing out all of these equations by hand rapidly becomes impossible. 

[0180] One of ordinary skill in the art based on the present disclosure, will recognize that 
methods of the present invention can be used to analyze the role of scaffolds in signal 
transduction regulation. For example, extended indexing can be used to specify reactions 
occurring in various sub-cellular compartments. This facilitates the study of the effect of 
scaffold translocation to the cell membrane observed in gradient sensing and other important 
regulatory processed. In addition, the algorithm can be developed to allow for scaffold 
dimerization, an experimentally observed phenomenon. 

[0181] The computer interface, Cellerator™, which was used in the present Example to 
perform a method according to the present invention to model events in a linear pathway 
mediated by sequential covalent modification, can be made more universal to include other 
canonical forms and variable structure systems. For example, the computer interfaces can be 
adapted to study the NF-kB and PKA pathways. Consideration of these pathways will 
necessitate implementation of the elementary reactions describing transcription, translation 
and protein degradation. In addition, complex formation will be considered as a high level 
reaction leading to an activation step within the pathway. 

EXAMPLE 2 

SIMULATION OF A DEVELOPMENTAL SYSTEM 

[0182] This example provides illustrates the use of the methods of the present invention 
for automatic model generation of a developmental system. The minimal developmental 
system can be illustrated using Goldbeter's minimal cascade for mitotic oscillations 
(Goldbeter, A., Proc. Natl Acad. Set USA, 88:9107-1 101 (1991)). It should be noted that any 
system of differential equations can be used here, so long as there is at least one threshold 
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that will trigger cell division. This particular system was chosen for illustrative purposes 
because of its elegant simplicity. The differential equations are 

""•-fie-* (U2) 

(U3) 

*-™*i£fci- v <-& (114) 

where C, M, and X represent Cyclin concentration (Q and the fractions of active 
CDC2 kinase (M) and cyclin protease (X), respectively, with the additional constraint that 

(u5) 

[0183] All other parameters in the equations are constants. 

[0184] The first differential equation (1 12) is straightforward; the first and third terms are 
creation and annihilation of C, 

{0^C,vi,kd} (116) 
while the middle term is a hill-function annihilation of C catalyzed by X: 

{Ci-^0,hill[vmax^ vd,khalf ^kd] } (H?) 

[0185] Equations (113) and (1 14) use implicit mass conservation; for example, the first 
term in equations 1 13 is 

{Comp [M] !-> M/hill [ vmax VI ,khalf -» Kl] } (1 1 8) 

and so forth for the remaining reactions. The constraint (115) is enforced utilizing 
Mathematical replacement rules. The entire set of reactions is illustrated in Figure 8. 
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[0186] The Cellerator™ run function will automatically perform a simulation 
(numerically solve the ODEs using NDSolve) and plot the resulting concentration profiles, as 
illustrated in Figure 8. It is not necessary to interpret the reactions first; this is done in Figure 
8 merely for illustrative purposes. To perform a simulation a graphDomain with appropriate 
initial conditions (cell geometry, initial concentrations, ODEs) for the system of interest is 
first created. The initial graph domain for a single-celled system is illustrated in Figure 9; the 
initial geometry for a multi-cellular system is illustrated in Figure 4. The second model 
Domain in Figure 9 implements the variable-structure threshold described hereinabove, with 
variables splitf lag and tsplit representing yy and z x respectively. The system can 
then be repeatedly integrated using NDSolve, testing the values of the flags after each step. 
Whenever a threshold is passed, a new node is added to the graph. The concentrations of C, 
M 9 and X are randomly split between parent and child cell after each cell division so that the 
total number of molecules remains fixed before and after cell division. Figure 10 shows a 
snapshot of the graph when the organism has grown to twenty cells. (At the point illustrated 
in Figure 10 the system had 354 links and 180 ODEs.) Each time a cell divides, a new cell 
number (equal to N+l where N is the number of cells in the graph just before cell division) is 
assigned to the child cell while the parent retains its old cell number (the first cell is number 
1). The corresponding node on the lineage tree is replace by a binary subtree 
tree[parent,child]. The final lineage tree is illustrated in figure 1 1 . A different simulation 
would produce a differently shaped organism because of the nature of the random number 
assignments. 

[0187] This Example presents a graph-based methodological paradigm for computational 
simulations in developmental biology. Multi-cellular systems (organisms) are represented as 
graphs whose nodes and links represent cells and intercellular interactions, respectively. This 
approach has a natural hierarchical generalization that makes it very amenable to multi-scale 
analyses: when more detail is needed, the data stored in a graph node can be expanded into a 
graph representing the intracellular signal transduction network (STN) of that cell. Nodes on 
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the STN are themselves progressively more detailed signal transduction sub-networks, and so 
forth, down to a single molecular species, if so desired. Because it would be pointless to try to 
deterministically simulate every chemical species in every cell at every time such a multi- 
scale approach becomes essential. 

[0188] All of the references cited herein are incorporated by reference. Although the 
invention has been described with reference to the above examples, it will be understood that 
modifications and variations are encompassed within the spirit and scope of the invention. 
Accordingly, the invention is limited only by the following claims. 
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